Popular Street Food classification¶

This dataset from kaggle is an image classification dataset primarily focused on identifying various popular street food items. The core problem it addresses is the automatic recognition and categorization of different street food dishes from images. This can be valuable for applications like building intelligent food recommendation systems, developing mobile apps for food identification, or even for culinary tourism platforms that help users discover local street food specialties.

source: https://www.kaggle.com/datasets/nikolasgegenava/popular-street-foods

In [1]:
import os
import math
import pandas as pd
import numpy as np
import keras
from keras import ops
from keras import layers
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0, EfficientNetB4, ResNet50V2
import matplotlib.pyplot as plt
from PIL import Image
import random # Import the random module
2025-07-07 08:51:38.598064: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1751878299.555401   37962 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1751878299.795539   37962 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1751878301.584603   37962 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1751878301.584690   37962 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1751878301.584693   37962 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1751878301.584694   37962 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-07-07 08:51:41.739526: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

Basic EDA¶

The dataset comes with a stats file listing 20 street food labels along with image count, avg_width, avg_height, min_height, max_height, formats and corrupt files

In [2]:
data_stats = pd.read_csv("dataset_stats.csv")
data_stats
Out[2]:
class image_count avg_width avg_height min_width min_height max_width max_height formats corrupt_files
0 churros 200 135 127 88 57 162 140 jpeg 0
1 gelato 200 145 122 78 55 162 140 jpeg, png 0
2 currywurst 200 142 121 54 63 162 140 jpeg 0
3 crepes 200 134 128 93 79 162 140 jpeg 0
4 hot_dog 200 143 121 89 54 162 140 jpeg 0
5 shawarma 200 133 129 78 85 162 140 jpeg 0
6 fish_and_chips 200 146 119 84 69 162 140 jpeg 0
7 pani_puri 199 133 127 78 77 162 140 jpeg 0
8 tacos 198 134 126 92 63 162 140 jpeg 0
9 poutine 196 140 122 91 65 162 140 jpeg 0
10 burger 195 146 121 92 50 162 140 jpeg 0
11 empanadas 180 138 125 89 68 162 140 jpeg 0
12 pretzel 179 130 131 81 73 162 140 jpeg 0
13 falafel 177 130 131 78 85 162 140 jpeg 0
14 kebab_(shish_kebab) 175 137 125 63 59 162 140 jpeg, png 0
15 pizza_slice 174 146 125 78 72 162 140 jpeg, png 0
16 bánh_mì 160 147 120 70 63 162 140 jpeg 0
17 arepas 160 144 121 93 68 162 140 jpeg 0
18 pad_thai 160 130 130 93 79 162 140 jpeg 0
19 samosas 158 129 131 78 90 162 140 jpeg 0

Visual inspection of samples¶

In [3]:
# Define the base data directory
data_dir = "data"

# Dictionary to store one random sample image path per class name (folder name)
sample_images_per_class = {}

# Get a list of all items in the data_dir
# We are looking for subdirectories which represent the class names
class_folders = [
    f for f in os.listdir(data_dir)
    if os.path.isdir(os.path.join(data_dir, f))
]

if not class_folders:
    print(f"Error: No class-named subfolders found directly in '{data_dir}'.")
    print("Expected structure: data/CLASS_NAME_1/, data/CLASS_NAME_2/, etc.")
else:
    for class_name in class_folders:
        class_folder_path = os.path.join(data_dir, class_name)
        
        # Collect all image paths in the current class folder
        images_in_folder = []
        for item_name in os.listdir(class_folder_path):
            item_path = os.path.join(class_folder_path, item_name)
            
            # Check if it's an image file
            if os.path.isfile(item_path) and item_name.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff')):
                images_in_folder.append(item_path)
        
        if images_in_folder:
            # Randomly select one image from the collected list
            selected_image_path = random.choice(images_in_folder)
            sample_images_per_class[class_name] = selected_image_path
        else:
            print(f"Warning: No image files found in class folder '{class_name}'.")

    # --- Plotting the sample images ---
    if not sample_images_per_class:
        print("No sample images could be collected from any class folders.")
    else:
        print("\nDisplaying one random sample image for each class:")
        
        num_classes = len(sample_images_per_class)
        
        # Adjust grid size dynamically for better visualization
        cols = int(num_classes**0.5) if num_classes > 0 else 1
        if cols == 0: cols = 1 # Ensure cols is at least 1 for empty or single class case
        
        rows = (num_classes + cols - 1) // cols

        fig, axes = plt.subplots(rows, cols, figsize=(cols * 4, rows * 4)) # Adjust figsize as needed
        # Flatten axes array for easy iteration, even if it's a single subplot
        axes = axes.flatten() if num_classes > 1 else [axes] 

        # Sort class names for consistent plotting order
        sorted_class_names = sorted(sample_images_per_class.keys())

        for i, class_name in enumerate(sorted_class_names):
            if i >= len(axes): # Safety break if more classes than pre-allocated subplots
                break
            
            img_path = sample_images_per_class[class_name]
            try:
                img = Image.open(img_path)
                axes[i].imshow(img)
                axes[i].set_title(f"{class_name}", fontsize=10, pad=5) # Add padding
                axes[i].axis('off') # Hide axes ticks and labels
            except Exception as e:
                print(f"Error loading or plotting image '{img_path}' for class '{class_name}': {e}")
                axes[i].set_title(f"Error: {class_name}")
                axes[i].axis('off')

        # Hide any unused subplots
        for j in range(num_classes, len(axes)):
            fig.delaxes(axes[j])

        plt.tight_layout() # Adjust layout to prevent overlapping titles/labels
        plt.show()
Displaying one random sample image for each class:
No description has been provided for this image

Data preparation¶

In [4]:
# Defining parameters
# DATA
BUFFER_SIZE = 512
BATCH_SIZE = 32

# AUGMENTATION
IMG_WIDTH = 32
IMG_HEIGHT = 32
CHANNELS = 3
BATCH_SIZE = 64
NUM_CLASSES = 20

IMAGE_SIZE = IMG_WIDTH
PATCH_SIZE = 4
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2

# OPTIMIZER
LEARNING_RATE = 0.001
WEIGHT_DECAY = 0.0001

# TRAINING
EPOCHS = 50

# ARCHITECTURE
LAYER_NORM_EPS = 1e-6
TRANSFORMER_LAYERS = 12
PROJECTION_DIM = 64
NUM_HEADS = 4
TRANSFORMER_UNITS = [
    PROJECTION_DIM * 2,
    PROJECTION_DIM,
]
MLP_HEAD_UNITS = [2048, 1024]
In [5]:
train_ds, val_ds= tf.keras.preprocessing.image_dataset_from_directory("data", color_mode="rgb", subset="both", validation_split=0.1, seed=2025, image_size=(IMG_HEIGHT, IMG_WIDTH))
Found 3674 files belonging to 20 classes.
Using 3307 files for training.
Using 367 files for validation.
I0000 00:00:1751878342.090324   37962 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2242 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
In [6]:
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

Data Augmentation¶

Data augmentation is a pre-processing step which prevents overfitting and improve the generalization ability of machine learning models.

Here I am normalizing, fliping, rotating, zooming, adding contrast, shifting images (translating)

In [7]:
data_augmentation = keras.Sequential(
    [
        layers.Normalization(),
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomRotation(factor=0.15),
        layers.RandomZoom(height_factor=0.2, width_factor=0.2),
        layers.RandomContrast(0.1),
        layers.RandomTranslation(0.1, 0.1),
    ],
    name="data_augmentation",
)

data_augmentation.layers[0].adapt(train_ds.map(lambda x, y: x))
2025-07-07 08:52:31.079096: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
In [8]:
# util function to plot history
def plot_training_history(history, plot_title):
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title(f"{plot_title} accuracy")
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title(f"{plot_title} loss")
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()
In [9]:
# variable to hold results
results_df = []

Loss and metrics¶

All below models were trained using the AdamW optimizer, configured with a fixed learning_rate and a weight_decay.

For evaluating model performance, I used the SparseCategoricalCrossentropy loss function, which is suitable for multi-class classification problems. The primary metrics I monitored during training were accuracy (SparseCategoricalAccuracy), indicating the percentage of correctly classified samples, and top-3-accuracy (SparseTopKCategoricalAccuracy), showing the percentage of samples where the correct label was among the top three predicted classes.

Model building¶

Vision Transformer (with and without self attention)¶

The Vision Transformer (ViT) is a groundbreaking deep learning architecture that applies the Transformer model, originally developed for natural language processing (NLP), directly to image classification tasks.

The core idea behind ViT is to break down an image into smaller, non-overlapping patches and treat these patches as a sequence of "words" or "tokens". This allows the powerful self-attention mechanism of the Transformer to be applied to learn relationships between different parts of the image

Additionally, I also tried a variant of ViT with shifted patch tokenization.

The Vanilla Vision Transformer segments an image into discrete, non-overlapping patches, relying solely on self-attention to learn spatial relationships, which demands immense datasets for effective training due to its minimal inherent visual biases. Conversely, a Vision Transformer with shifted patch tokenization introduces overlapping patches through spatial shifting, injecting crucial local inductive biases. This modification allows the model to capture finer local details more efficiently, significantly improving its performance and reducing its reliance on colossal pre-training datasets compared to the vanilla version.

In [10]:
class ShiftedPatchTokenization(layers.Layer):
    def __init__(
        self,
        image_size=IMAGE_SIZE,
        patch_size=PATCH_SIZE,
        num_patches=NUM_PATCHES,
        projection_dim=PROJECTION_DIM,
        vanilla=False,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.vanilla = vanilla  # Flag to swtich to vanilla patch extractor
        self.image_size = image_size
        self.patch_size = patch_size
        self.half_patch = patch_size // 2
        self.flatten_patches = layers.Reshape((num_patches, -1))
        self.projection = layers.Dense(units=projection_dim)
        self.layer_norm = layers.LayerNormalization(epsilon=LAYER_NORM_EPS)

    def crop_shift_pad(self, images, mode):
        # Build the diagonally shifted images
        if mode == "left-up":
            crop_height = self.half_patch
            crop_width = self.half_patch
            shift_height = 0
            shift_width = 0
        elif mode == "left-down":
            crop_height = 0
            crop_width = self.half_patch
            shift_height = self.half_patch
            shift_width = 0
        elif mode == "right-up":
            crop_height = self.half_patch
            crop_width = 0
            shift_height = 0
            shift_width = self.half_patch
        else:
            crop_height = 0
            crop_width = 0
            shift_height = self.half_patch
            shift_width = self.half_patch

        # Crop the shifted images and pad them
        crop = ops.image.crop_images(
            images,
            top_cropping=crop_height,
            left_cropping=crop_width,
            target_height=self.image_size - self.half_patch,
            target_width=self.image_size - self.half_patch,
        )
        shift_pad = ops.image.pad_images(
            crop,
            top_padding=shift_height,
            left_padding=shift_width,
            target_height=self.image_size,
            target_width=self.image_size,
        )
        return shift_pad

    def call(self, images):
        if not self.vanilla:
            # Concat the shifted images with the original image
            images = ops.concatenate(
                [
                    images,
                    self.crop_shift_pad(images, mode="left-up"),
                    self.crop_shift_pad(images, mode="left-down"),
                    self.crop_shift_pad(images, mode="right-up"),
                    self.crop_shift_pad(images, mode="right-down"),
                ],
                axis=-1,
            )
        # Patchify the images and flatten it
        patches = ops.image.extract_patches(
            images=images,
            size=(self.patch_size, self.patch_size),
            strides=[1, self.patch_size, self.patch_size, 1],
            dilation_rate=1,
            padding="VALID",
        )
        flat_patches = self.flatten_patches(patches)
        if not self.vanilla:
            # Layer normalize the flat patches and linearly project it
            tokens = self.layer_norm(flat_patches)
            tokens = self.projection(tokens)
        else:
            # Linearly project the flat patches
            tokens = self.projection(flat_patches)
        return (tokens, patches)
In [11]:
class PatchEncoder(layers.Layer):
    def __init__(
        self, num_patches=NUM_PATCHES, projection_dim=PROJECTION_DIM, **kwargs
    ):
        super().__init__(**kwargs)
        self.num_patches = num_patches
        self.position_embedding = layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim
        )
        self.positions = ops.arange(start=0, stop=self.num_patches, step=1)

    def call(self, encoded_patches):
        encoded_positions = self.position_embedding(self.positions)
        encoded_patches = encoded_patches + encoded_positions
        return encoded_patches
In [12]:
class MultiHeadAttentionLSA(layers.MultiHeadAttention):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # The trainable temperature term. The initial value is
        # the square root of the key dimension.
        self.tau = keras.Variable(math.sqrt(float(self._key_dim)), trainable=True)

    def _compute_attention(self, query, key, value, attention_mask=None, training=None, *args, **kwargs):
        query = ops.multiply(query, 1.0 / self.tau)
        attention_scores = ops.einsum(self._dot_product_equation, key, query)
        attention_scores = self._masked_softmax(attention_scores, attention_mask)
        attention_scores_dropout = self._dropout_layer(
            attention_scores, training=training
        )
        attention_output = ops.einsum(
            self._combine_equation, attention_scores_dropout, value
        )
        return attention_output, attention_scores
In [13]:
def mlp(x, hidden_units, dropout_rate):
    for units in hidden_units:
        x = layers.Dense(units, activation="gelu")(x)
        x = layers.Dropout(dropout_rate)(x)
    return x


# Build the diagonal attention mask
diag_attn_mask = 1 - ops.eye(NUM_PATCHES)
diag_attn_mask = ops.cast([diag_attn_mask], dtype="int8")
In [14]:
def create_vit_classifier(vanilla=False):
    inputs = layers.Input(shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS))
    # Augment data.
    augmented = data_augmentation(inputs)
    # Create patches.
    (tokens, _) = ShiftedPatchTokenization(vanilla=vanilla)(augmented)
    # Encode patches.
    encoded_patches = PatchEncoder()(tokens)

    # Create multiple layers of the Transformer block.
    for _ in range(TRANSFORMER_LAYERS):
        # Layer normalization 1.
        x1 = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
        # Create a multi-head attention layer.
        if not vanilla:
            attention_output = MultiHeadAttentionLSA(
                num_heads=NUM_HEADS, key_dim=PROJECTION_DIM, dropout=0.1
            )(x1, x1, attention_mask=diag_attn_mask)
        else:
            attention_output = layers.MultiHeadAttention(
                num_heads=NUM_HEADS, key_dim=PROJECTION_DIM, dropout=0.1
            )(x1, x1)
        # Skip connection 1.
        x2 = layers.Add()([attention_output, encoded_patches])
        # Layer normalization 2.
        x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
        # MLP.
        x3 = mlp(x3, hidden_units=TRANSFORMER_UNITS, dropout_rate=0.1)
        # Skip connection 2.
        encoded_patches = layers.Add()([x3, x2])

    # Create a [batch_size, projection_dim] tensor.
    representation = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    representation = layers.Flatten()(representation)
    representation = layers.Dropout(0.5)(representation)
    # Add MLP.
    features = mlp(representation, hidden_units=MLP_HEAD_UNITS, dropout_rate=0.5)
    # Classify outputs.
    logits = layers.Dense(NUM_CLASSES)(features)
    # Create the Keras model.
    model = keras.Model(inputs=inputs, outputs=logits)
    return model
In [15]:
# callbacks
early_stop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)
In [16]:
def train_vit_model(model, epochs):

    optimizer = keras.optimizers.AdamW(
        learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY
    )

    model.compile(
        optimizer=optimizer,
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[
            keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
            keras.metrics.SparseTopKCategoricalAccuracy(3, name="top-3-accuracy"),
        ],
    )

    history = model.fit(
        train_ds,           # Directly pass the tf.data.Dataset
        validation_data=val_ds, # Directly pass the tf.data.Dataset
        epochs=epochs,
        callbacks = [early_stop_callback, reduce_lr_callback],
        validation_split=0.2
    )

    return history
In [17]:
vit_model = create_vit_classifier(vanilla=True)
vanila_vit_history = train_vit_model(vit_model, 100)
/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/keras/src/layers/layer.py:421: UserWarning: `build()` was called on layer 'shifted_patch_tokenization', however the layer does not have a `build()` method implemented and it looks like it has unbuilt state. This will cause the layer to be marked as built, despite not being actually built, which may cause failures down the line. Make sure to implement a proper `build()` method.
  warnings.warn(
Epoch 1/100
I0000 00:00:1751878391.891798   38181 cuda_dnn.cc:529] Loaded cuDNN version 90300
104/104 ━━━━━━━━━━━━━━━━━━━━ 65s 238ms/step - accuracy: 0.0475 - loss: 4.3422 - top-3-accuracy: 0.1474 - val_accuracy: 0.0627 - val_loss: 3.1052 - val_top-3-accuracy: 0.1689 - learning_rate: 0.0010
Epoch 2/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.0525 - loss: 3.1075 - top-3-accuracy: 0.1565 - val_accuracy: 0.0845 - val_loss: 3.0059 - val_top-3-accuracy: 0.1635 - learning_rate: 0.0010
Epoch 3/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 205ms/step - accuracy: 0.0578 - loss: 3.0491 - top-3-accuracy: 0.1927 - val_accuracy: 0.0681 - val_loss: 2.9777 - val_top-3-accuracy: 0.1798 - learning_rate: 0.0010
Epoch 4/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.0772 - loss: 3.0242 - top-3-accuracy: 0.1830 - val_accuracy: 0.0899 - val_loss: 2.9482 - val_top-3-accuracy: 0.2125 - learning_rate: 0.0010
Epoch 5/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 201ms/step - accuracy: 0.0746 - loss: 3.0034 - top-3-accuracy: 0.1865 - val_accuracy: 0.0954 - val_loss: 2.9204 - val_top-3-accuracy: 0.2343 - learning_rate: 0.0010
Epoch 6/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 206ms/step - accuracy: 0.0828 - loss: 2.9845 - top-3-accuracy: 0.2064 - val_accuracy: 0.0681 - val_loss: 2.8991 - val_top-3-accuracy: 0.2452 - learning_rate: 0.0010
Epoch 7/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 20s 196ms/step - accuracy: 0.0913 - loss: 2.9467 - top-3-accuracy: 0.2114 - val_accuracy: 0.0899 - val_loss: 2.9196 - val_top-3-accuracy: 0.2616 - learning_rate: 0.0010
Epoch 8/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 209ms/step - accuracy: 0.0960 - loss: 2.9338 - top-3-accuracy: 0.2438 - val_accuracy: 0.1008 - val_loss: 2.8782 - val_top-3-accuracy: 0.2507 - learning_rate: 0.0010
Epoch 9/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 208ms/step - accuracy: 0.1056 - loss: 2.9200 - top-3-accuracy: 0.2463 - val_accuracy: 0.1362 - val_loss: 2.8588 - val_top-3-accuracy: 0.2779 - learning_rate: 0.0010
Epoch 10/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 199ms/step - accuracy: 0.1060 - loss: 2.8734 - top-3-accuracy: 0.2755 - val_accuracy: 0.1253 - val_loss: 2.9656 - val_top-3-accuracy: 0.2371 - learning_rate: 0.0010
Epoch 11/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 213ms/step - accuracy: 0.1170 - loss: 2.8476 - top-3-accuracy: 0.2847 - val_accuracy: 0.1008 - val_loss: 2.8513 - val_top-3-accuracy: 0.2970 - learning_rate: 0.0010
Epoch 12/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 206ms/step - accuracy: 0.1069 - loss: 2.8495 - top-3-accuracy: 0.2785 - val_accuracy: 0.1035 - val_loss: 2.8973 - val_top-3-accuracy: 0.2725 - learning_rate: 0.0010
Epoch 13/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 198ms/step - accuracy: 0.1159 - loss: 2.8432 - top-3-accuracy: 0.2869 - val_accuracy: 0.1199 - val_loss: 2.8180 - val_top-3-accuracy: 0.2943 - learning_rate: 0.0010
Epoch 14/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 208ms/step - accuracy: 0.1254 - loss: 2.7868 - top-3-accuracy: 0.3239 - val_accuracy: 0.0845 - val_loss: 2.9126 - val_top-3-accuracy: 0.2752 - learning_rate: 0.0010
Epoch 15/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.1351 - loss: 2.7672 - top-3-accuracy: 0.3316 - val_accuracy: 0.1199 - val_loss: 2.8766 - val_top-3-accuracy: 0.3106 - learning_rate: 0.0010
Epoch 16/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 209ms/step - accuracy: 0.1307 - loss: 2.7844 - top-3-accuracy: 0.3276 - val_accuracy: 0.1144 - val_loss: 2.8331 - val_top-3-accuracy: 0.2861 - learning_rate: 0.0010
Epoch 17/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 20s 194ms/step - accuracy: 0.1474 - loss: 2.7742 - top-3-accuracy: 0.3470 - val_accuracy: 0.1335 - val_loss: 2.8753 - val_top-3-accuracy: 0.3025 - learning_rate: 0.0010
Epoch 18/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.1595 - loss: 2.7382 - top-3-accuracy: 0.3409 - val_accuracy: 0.1335 - val_loss: 2.8440 - val_top-3-accuracy: 0.3079 - learning_rate: 0.0010
Epoch 19/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 200ms/step - accuracy: 0.1641 - loss: 2.6900 - top-3-accuracy: 0.3699 - val_accuracy: 0.1308 - val_loss: 2.8365 - val_top-3-accuracy: 0.3243 - learning_rate: 5.0000e-04
Epoch 20/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.1871 - loss: 2.6402 - top-3-accuracy: 0.4011 - val_accuracy: 0.1144 - val_loss: 2.8507 - val_top-3-accuracy: 0.3270 - learning_rate: 5.0000e-04
Epoch 21/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 206ms/step - accuracy: 0.1905 - loss: 2.6164 - top-3-accuracy: 0.4059 - val_accuracy: 0.1281 - val_loss: 2.8310 - val_top-3-accuracy: 0.3379 - learning_rate: 5.0000e-04
Epoch 22/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 204ms/step - accuracy: 0.1866 - loss: 2.6094 - top-3-accuracy: 0.4089 - val_accuracy: 0.1281 - val_loss: 2.8308 - val_top-3-accuracy: 0.3379 - learning_rate: 5.0000e-04
Epoch 23/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 198ms/step - accuracy: 0.1779 - loss: 2.6284 - top-3-accuracy: 0.4109 - val_accuracy: 0.1144 - val_loss: 2.8325 - val_top-3-accuracy: 0.3270 - learning_rate: 5.0000e-04
In [18]:
plot_training_history(vanila_vit_history, "Vanilla ViT Training and Validation")
val_loss, val_acc, val_top_3_acc = vit_model.evaluate(val_ds)
print(f"Val acc = {val_acc}, val loss = {val_loss}, val_top_3_acc = {val_top_3_acc}")
results_df.append({"model_type": "vanilla_vit", "best_training_accuracy": max(vanila_vit_history.history['accuracy']),
                   "validation_accuracy": val_acc, "validation_top_3":val_top_3_acc})
No description has been provided for this image
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - accuracy: 0.1205 - loss: 2.8149 - top-3-accuracy: 0.2835
Val acc = 0.11989101022481918, val loss = 2.817955493927002, val_top_3_acc = 0.29427793622016907
In [19]:
# Run experiments with the Shifted Patch Tokenization and
# Locality Self Attention modified ViT
vit_sl = create_vit_classifier(vanilla=False)
vit_sl_history = train_vit_model(vit_sl, 100)
Epoch 1/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 60s 238ms/step - accuracy: 0.0604 - loss: 4.2886 - top-3-accuracy: 0.1718 - val_accuracy: 0.0790 - val_loss: 2.9749 - val_top-3-accuracy: 0.2098 - learning_rate: 0.0010
Epoch 2/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.0715 - loss: 3.0851 - top-3-accuracy: 0.1834 - val_accuracy: 0.0817 - val_loss: 2.9648 - val_top-3-accuracy: 0.1962 - learning_rate: 0.0010
Epoch 3/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 216ms/step - accuracy: 0.0718 - loss: 3.0530 - top-3-accuracy: 0.2015 - val_accuracy: 0.0790 - val_loss: 2.9353 - val_top-3-accuracy: 0.2507 - learning_rate: 0.0010
Epoch 4/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 220ms/step - accuracy: 0.0791 - loss: 3.0117 - top-3-accuracy: 0.2042 - val_accuracy: 0.0872 - val_loss: 2.9426 - val_top-3-accuracy: 0.2153 - learning_rate: 0.0010
Epoch 5/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 199ms/step - accuracy: 0.0858 - loss: 2.9641 - top-3-accuracy: 0.2162 - val_accuracy: 0.1226 - val_loss: 2.8980 - val_top-3-accuracy: 0.2779 - learning_rate: 0.0010
Epoch 6/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.0892 - loss: 2.9504 - top-3-accuracy: 0.2337 - val_accuracy: 0.0845 - val_loss: 2.8909 - val_top-3-accuracy: 0.2752 - learning_rate: 0.0010
Epoch 7/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 219ms/step - accuracy: 0.0909 - loss: 2.9430 - top-3-accuracy: 0.2473 - val_accuracy: 0.1308 - val_loss: 2.8818 - val_top-3-accuracy: 0.2616 - learning_rate: 0.0010
Epoch 8/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.0853 - loss: 2.9309 - top-3-accuracy: 0.2440 - val_accuracy: 0.1063 - val_loss: 2.8617 - val_top-3-accuracy: 0.2834 - learning_rate: 0.0010
Epoch 9/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.0853 - loss: 2.9338 - top-3-accuracy: 0.2335 - val_accuracy: 0.0981 - val_loss: 2.8516 - val_top-3-accuracy: 0.2779 - learning_rate: 0.0010
Epoch 10/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 224ms/step - accuracy: 0.1064 - loss: 2.8968 - top-3-accuracy: 0.2504 - val_accuracy: 0.1172 - val_loss: 2.8264 - val_top-3-accuracy: 0.3052 - learning_rate: 0.0010
Epoch 11/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1016 - loss: 2.8877 - top-3-accuracy: 0.2703 - val_accuracy: 0.1199 - val_loss: 2.8065 - val_top-3-accuracy: 0.3161 - learning_rate: 0.0010
Epoch 12/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 207ms/step - accuracy: 0.1080 - loss: 2.8855 - top-3-accuracy: 0.2761 - val_accuracy: 0.1117 - val_loss: 2.8557 - val_top-3-accuracy: 0.3079 - learning_rate: 0.0010
Epoch 13/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 220ms/step - accuracy: 0.1109 - loss: 2.8757 - top-3-accuracy: 0.2830 - val_accuracy: 0.1144 - val_loss: 2.7872 - val_top-3-accuracy: 0.3134 - learning_rate: 0.0010
Epoch 14/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 223ms/step - accuracy: 0.0983 - loss: 2.8680 - top-3-accuracy: 0.2650 - val_accuracy: 0.1526 - val_loss: 2.7983 - val_top-3-accuracy: 0.3460 - learning_rate: 0.0010
Epoch 15/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 200ms/step - accuracy: 0.1146 - loss: 2.8514 - top-3-accuracy: 0.2970 - val_accuracy: 0.1199 - val_loss: 2.8963 - val_top-3-accuracy: 0.2752 - learning_rate: 0.0010
Epoch 16/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1115 - loss: 2.8567 - top-3-accuracy: 0.2933 - val_accuracy: 0.1253 - val_loss: 2.8348 - val_top-3-accuracy: 0.2997 - learning_rate: 0.0010
Epoch 17/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 205ms/step - accuracy: 0.1082 - loss: 2.8496 - top-3-accuracy: 0.2795 - val_accuracy: 0.1199 - val_loss: 2.8863 - val_top-3-accuracy: 0.2725 - learning_rate: 0.0010
Epoch 18/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.1125 - loss: 2.8760 - top-3-accuracy: 0.2763 - val_accuracy: 0.1499 - val_loss: 2.7769 - val_top-3-accuracy: 0.3433 - learning_rate: 0.0010
Epoch 19/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1203 - loss: 2.8520 - top-3-accuracy: 0.2830 - val_accuracy: 0.1580 - val_loss: 2.7991 - val_top-3-accuracy: 0.3134 - learning_rate: 0.0010
Epoch 20/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 207ms/step - accuracy: 0.1149 - loss: 2.8605 - top-3-accuracy: 0.2852 - val_accuracy: 0.1471 - val_loss: 2.8006 - val_top-3-accuracy: 0.3161 - learning_rate: 0.0010
Epoch 21/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1308 - loss: 2.8088 - top-3-accuracy: 0.3167 - val_accuracy: 0.1226 - val_loss: 2.7711 - val_top-3-accuracy: 0.3406 - learning_rate: 0.0010
Epoch 22/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 222ms/step - accuracy: 0.1124 - loss: 2.8264 - top-3-accuracy: 0.2943 - val_accuracy: 0.1281 - val_loss: 2.7494 - val_top-3-accuracy: 0.3243 - learning_rate: 0.0010
Epoch 23/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 202ms/step - accuracy: 0.1162 - loss: 2.8208 - top-3-accuracy: 0.2851 - val_accuracy: 0.1417 - val_loss: 2.7491 - val_top-3-accuracy: 0.3597 - learning_rate: 0.0010
Epoch 24/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 221ms/step - accuracy: 0.1294 - loss: 2.8028 - top-3-accuracy: 0.3059 - val_accuracy: 0.1635 - val_loss: 2.7487 - val_top-3-accuracy: 0.3542 - learning_rate: 0.0010
Epoch 25/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 216ms/step - accuracy: 0.1279 - loss: 2.7987 - top-3-accuracy: 0.2973 - val_accuracy: 0.1226 - val_loss: 2.7875 - val_top-3-accuracy: 0.3270 - learning_rate: 0.0010
Epoch 26/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 216ms/step - accuracy: 0.1435 - loss: 2.7930 - top-3-accuracy: 0.3179 - val_accuracy: 0.0954 - val_loss: 2.7959 - val_top-3-accuracy: 0.3215 - learning_rate: 0.0010
Epoch 27/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 213ms/step - accuracy: 0.1197 - loss: 2.8265 - top-3-accuracy: 0.2960 - val_accuracy: 0.1335 - val_loss: 2.7820 - val_top-3-accuracy: 0.3651 - learning_rate: 0.0010
Epoch 28/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 223ms/step - accuracy: 0.1204 - loss: 2.7983 - top-3-accuracy: 0.3161 - val_accuracy: 0.1417 - val_loss: 2.7590 - val_top-3-accuracy: 0.3188 - learning_rate: 0.0010
Epoch 29/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 221ms/step - accuracy: 0.1321 - loss: 2.7696 - top-3-accuracy: 0.3344 - val_accuracy: 0.1608 - val_loss: 2.7251 - val_top-3-accuracy: 0.3869 - learning_rate: 0.0010
Epoch 30/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.1433 - loss: 2.7658 - top-3-accuracy: 0.3344 - val_accuracy: 0.1390 - val_loss: 2.6972 - val_top-3-accuracy: 0.3678 - learning_rate: 0.0010
Epoch 31/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 219ms/step - accuracy: 0.1511 - loss: 2.7686 - top-3-accuracy: 0.3412 - val_accuracy: 0.1335 - val_loss: 2.7743 - val_top-3-accuracy: 0.3297 - learning_rate: 0.0010
Epoch 32/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 217ms/step - accuracy: 0.1559 - loss: 2.7718 - top-3-accuracy: 0.3436 - val_accuracy: 0.1635 - val_loss: 2.7810 - val_top-3-accuracy: 0.3406 - learning_rate: 0.0010
Epoch 33/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 208ms/step - accuracy: 0.1432 - loss: 2.7599 - top-3-accuracy: 0.3485 - val_accuracy: 0.1417 - val_loss: 2.7017 - val_top-3-accuracy: 0.3706 - learning_rate: 0.0010
Epoch 34/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 214ms/step - accuracy: 0.1551 - loss: 2.7441 - top-3-accuracy: 0.3525 - val_accuracy: 0.1199 - val_loss: 2.7954 - val_top-3-accuracy: 0.3188 - learning_rate: 0.0010
Epoch 35/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 222ms/step - accuracy: 0.1549 - loss: 2.7202 - top-3-accuracy: 0.3526 - val_accuracy: 0.1526 - val_loss: 2.6840 - val_top-3-accuracy: 0.3406 - learning_rate: 0.0010
Epoch 36/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 206ms/step - accuracy: 0.1679 - loss: 2.7341 - top-3-accuracy: 0.3559 - val_accuracy: 0.1499 - val_loss: 2.7264 - val_top-3-accuracy: 0.3460 - learning_rate: 0.0010
Epoch 37/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 218ms/step - accuracy: 0.1536 - loss: 2.7197 - top-3-accuracy: 0.3493 - val_accuracy: 0.1444 - val_loss: 2.6592 - val_top-3-accuracy: 0.4060 - learning_rate: 0.0010
Epoch 38/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 220ms/step - accuracy: 0.1551 - loss: 2.7172 - top-3-accuracy: 0.3662 - val_accuracy: 0.1499 - val_loss: 2.7192 - val_top-3-accuracy: 0.3624 - learning_rate: 0.0010
Epoch 39/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.1494 - loss: 2.7004 - top-3-accuracy: 0.3564 - val_accuracy: 0.1253 - val_loss: 2.7453 - val_top-3-accuracy: 0.3706 - learning_rate: 0.0010
Epoch 40/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 220ms/step - accuracy: 0.1547 - loss: 2.6906 - top-3-accuracy: 0.3519 - val_accuracy: 0.1717 - val_loss: 2.6780 - val_top-3-accuracy: 0.3706 - learning_rate: 0.0010
Epoch 41/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 24s 227ms/step - accuracy: 0.1576 - loss: 2.6916 - top-3-accuracy: 0.3567 - val_accuracy: 0.1771 - val_loss: 2.7112 - val_top-3-accuracy: 0.3651 - learning_rate: 0.0010
Epoch 42/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 213ms/step - accuracy: 0.1579 - loss: 2.6976 - top-3-accuracy: 0.3702 - val_accuracy: 0.1417 - val_loss: 2.6720 - val_top-3-accuracy: 0.3815 - learning_rate: 0.0010
Epoch 43/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 211ms/step - accuracy: 0.1846 - loss: 2.6253 - top-3-accuracy: 0.3962 - val_accuracy: 0.1689 - val_loss: 2.6471 - val_top-3-accuracy: 0.4196 - learning_rate: 5.0000e-04
Epoch 44/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 221ms/step - accuracy: 0.1836 - loss: 2.6117 - top-3-accuracy: 0.4109 - val_accuracy: 0.1717 - val_loss: 2.6353 - val_top-3-accuracy: 0.4005 - learning_rate: 5.0000e-04
Epoch 45/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 24s 228ms/step - accuracy: 0.1838 - loss: 2.6044 - top-3-accuracy: 0.4254 - val_accuracy: 0.1635 - val_loss: 2.6319 - val_top-3-accuracy: 0.4060 - learning_rate: 5.0000e-04
Epoch 46/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1994 - loss: 2.5977 - top-3-accuracy: 0.4204 - val_accuracy: 0.1826 - val_loss: 2.6899 - val_top-3-accuracy: 0.4087 - learning_rate: 5.0000e-04
Epoch 47/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.1978 - loss: 2.5790 - top-3-accuracy: 0.4313 - val_accuracy: 0.1580 - val_loss: 2.6980 - val_top-3-accuracy: 0.3733 - learning_rate: 5.0000e-04
Epoch 48/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 212ms/step - accuracy: 0.1953 - loss: 2.5713 - top-3-accuracy: 0.4335 - val_accuracy: 0.1771 - val_loss: 2.6281 - val_top-3-accuracy: 0.3978 - learning_rate: 5.0000e-04
Epoch 49/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.2120 - loss: 2.5583 - top-3-accuracy: 0.4347 - val_accuracy: 0.1580 - val_loss: 2.6608 - val_top-3-accuracy: 0.3842 - learning_rate: 5.0000e-04
Epoch 50/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 216ms/step - accuracy: 0.2012 - loss: 2.5561 - top-3-accuracy: 0.4253 - val_accuracy: 0.1907 - val_loss: 2.6299 - val_top-3-accuracy: 0.3869 - learning_rate: 5.0000e-04
Epoch 51/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 222ms/step - accuracy: 0.2100 - loss: 2.5481 - top-3-accuracy: 0.4452 - val_accuracy: 0.1662 - val_loss: 2.6431 - val_top-3-accuracy: 0.4387 - learning_rate: 5.0000e-04
Epoch 52/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 204ms/step - accuracy: 0.2044 - loss: 2.5509 - top-3-accuracy: 0.4448 - val_accuracy: 0.1744 - val_loss: 2.6362 - val_top-3-accuracy: 0.4114 - learning_rate: 5.0000e-04
Epoch 53/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.1965 - loss: 2.5187 - top-3-accuracy: 0.4493 - val_accuracy: 0.1662 - val_loss: 2.6216 - val_top-3-accuracy: 0.4223 - learning_rate: 5.0000e-04
Epoch 54/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 220ms/step - accuracy: 0.2274 - loss: 2.5185 - top-3-accuracy: 0.4644 - val_accuracy: 0.1826 - val_loss: 2.6692 - val_top-3-accuracy: 0.4169 - learning_rate: 5.0000e-04
Epoch 55/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 206ms/step - accuracy: 0.2243 - loss: 2.5088 - top-3-accuracy: 0.4850 - val_accuracy: 0.1608 - val_loss: 2.6534 - val_top-3-accuracy: 0.4142 - learning_rate: 5.0000e-04
Epoch 56/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.2205 - loss: 2.5340 - top-3-accuracy: 0.4447 - val_accuracy: 0.1744 - val_loss: 2.6291 - val_top-3-accuracy: 0.4278 - learning_rate: 5.0000e-04
Epoch 57/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 213ms/step - accuracy: 0.2204 - loss: 2.5138 - top-3-accuracy: 0.4792 - val_accuracy: 0.2180 - val_loss: 2.5962 - val_top-3-accuracy: 0.4360 - learning_rate: 5.0000e-04
Epoch 58/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 222ms/step - accuracy: 0.2276 - loss: 2.4943 - top-3-accuracy: 0.4788 - val_accuracy: 0.2153 - val_loss: 2.6192 - val_top-3-accuracy: 0.4305 - learning_rate: 5.0000e-04
Epoch 59/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 216ms/step - accuracy: 0.2266 - loss: 2.4853 - top-3-accuracy: 0.4696 - val_accuracy: 0.2262 - val_loss: 2.5099 - val_top-3-accuracy: 0.4741 - learning_rate: 5.0000e-04
Epoch 60/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 204ms/step - accuracy: 0.2137 - loss: 2.5061 - top-3-accuracy: 0.4794 - val_accuracy: 0.2071 - val_loss: 2.6225 - val_top-3-accuracy: 0.4169 - learning_rate: 5.0000e-04
Epoch 61/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 217ms/step - accuracy: 0.2106 - loss: 2.4847 - top-3-accuracy: 0.4691 - val_accuracy: 0.2234 - val_loss: 2.5557 - val_top-3-accuracy: 0.4387 - learning_rate: 5.0000e-04
Epoch 62/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 221ms/step - accuracy: 0.2396 - loss: 2.4639 - top-3-accuracy: 0.4816 - val_accuracy: 0.1989 - val_loss: 2.6070 - val_top-3-accuracy: 0.4278 - learning_rate: 5.0000e-04
Epoch 63/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.2280 - loss: 2.4833 - top-3-accuracy: 0.4766 - val_accuracy: 0.2071 - val_loss: 2.6034 - val_top-3-accuracy: 0.4414 - learning_rate: 5.0000e-04
Epoch 64/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 215ms/step - accuracy: 0.2148 - loss: 2.4856 - top-3-accuracy: 0.4776 - val_accuracy: 0.2071 - val_loss: 2.5832 - val_top-3-accuracy: 0.4414 - learning_rate: 5.0000e-04
Epoch 65/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 223ms/step - accuracy: 0.2474 - loss: 2.4486 - top-3-accuracy: 0.4973 - val_accuracy: 0.2153 - val_loss: 2.5738 - val_top-3-accuracy: 0.4414 - learning_rate: 2.5000e-04
Epoch 66/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 210ms/step - accuracy: 0.2543 - loss: 2.4252 - top-3-accuracy: 0.5000 - val_accuracy: 0.2153 - val_loss: 2.5582 - val_top-3-accuracy: 0.4523 - learning_rate: 2.5000e-04
Epoch 67/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 23s 219ms/step - accuracy: 0.2666 - loss: 2.3896 - top-3-accuracy: 0.5086 - val_accuracy: 0.2071 - val_loss: 2.5503 - val_top-3-accuracy: 0.4441 - learning_rate: 2.5000e-04
Epoch 68/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 22s 214ms/step - accuracy: 0.2354 - loss: 2.4255 - top-3-accuracy: 0.4973 - val_accuracy: 0.2098 - val_loss: 2.5434 - val_top-3-accuracy: 0.4605 - learning_rate: 2.5000e-04
Epoch 69/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 21s 205ms/step - accuracy: 0.2464 - loss: 2.3909 - top-3-accuracy: 0.5156 - val_accuracy: 0.2262 - val_loss: 2.5587 - val_top-3-accuracy: 0.4414 - learning_rate: 2.5000e-04
In [20]:
plot_training_history(vit_sl_history, "ViT(self attention) Training and Validation")
val_loss, val_acc, val_top_3_acc = vit_sl.evaluate(val_ds)
print(f"Val acc = {val_acc}, val loss = {val_loss}, val_top_3_acc = {val_top_3_acc}")
results_df.append({"model_type": "self_attention_vit", "best_training_accuracy": max(vit_sl_history.history['accuracy']), 
                   "validation_accuracy": val_acc, "validation_top_3":val_top_3_acc})
No description has been provided for this image
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 0.2548 - loss: 2.4530 - top-3-accuracy: 0.4949
Val acc = 0.2261580377817154, val loss = 2.5098989009857178, val_top_3_acc = 0.47411444783210754

CNN based model¶

CNN is most robust and widely used technique for image classification.

A Convolutional Neural Network (CNN) works by using specialized layers called convolutional layers to automatically learn patterns directly from images. These layers act as a collection of digital filters that slide across the image, identifying specific features like edges, textures, or corners. A key advantage is that these same filters are applied uniformly across the entire image, enabling the network to recognize a pattern regardless of its position, a concept known as translation equivariance. Between these filtering steps, pooling layers condense the information, making the network more robust to slight variations in the image while also reducing its complexity. Through multiple such layers, a CNN builds a hierarchical understanding: starting with simple, low-level features and progressively combining them to form highly abstract, high-level representations. This design makes CNNs exceptionally effective for tasks requiring visual understanding, as they naturally align with the structural properties of image data.

In [21]:
# Model configuration
NUM_CLASSES = 20
DROPOUT_RATE = 0.5
L2_REG = 0.001
INITIAL_LR = 0.001

def build_deep_cnn_model(input_shape, num_classes):
    """
    Build a deep CNN with regularization
    """
    model = tf.keras.models.Sequential([
        # Input layer
        layers.InputLayer(input_shape=input_shape),

        # First convolutional block
        layers.Conv2D(64, (3, 3), activation='relu', padding='same',
                     kernel_regularizer=tf.keras.regularizers.l2(L2_REG)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(DROPOUT_RATE/2),

        # Second convolutional block
        layers.Conv2D(128, (3, 3), activation='relu', padding='same',
                     kernel_regularizer=tf.keras.regularizers.l2(L2_REG)),
        layers.BatchNormalization(),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(DROPOUT_RATE),

        # Third convolutional block (deeper)
        layers.Conv2D(256, (3, 3), activation='relu', padding='same',
                     kernel_regularizer=tf.keras.regularizers.l2(L2_REG)),
        layers.BatchNormalization(),
        layers.Conv2D(256, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(DROPOUT_RATE),

        # Fourth convolutional block
        layers.Conv2D(512, (3, 3), activation='relu', padding='same',
                     kernel_regularizer=tf.keras.regularizers.l2(L2_REG)),
        layers.BatchNormalization(),
        layers.Conv2D(512, (3, 3), activation='relu', padding='same'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(DROPOUT_RATE),

        # Flatten and dense layers
        layers.Flatten(),
        layers.Dense(1024, activation='relu',
                   kernel_regularizer=tf.keras.regularizers.l2(L2_REG)),
        layers.BatchNormalization(),
        layers.Dropout(DROPOUT_RATE),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(DROPOUT_RATE),

        # Output layer
        layers.Dense(num_classes, activation='softmax')
    ])

    return model
In [22]:
def train_cnn_model(model, epochs):
    # Compile the model
    optimizer = keras.optimizers.AdamW(
    learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

    model.compile(optimizer=optimizer,
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[
                      keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
                      keras.metrics.SparseTopKCategoricalAccuracy(3, name="top-3-accuracy"),])

    # Train the model
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[early_stop_callback, reduce_lr_callback]
    )

    return history
In [23]:
# Build the model
cnn_model = build_deep_cnn_model(input_shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS), num_classes=NUM_CLASSES)
/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/keras/src/layers/core/input_layer.py:27: UserWarning: Argument `input_shape` is deprecated. Use `shape` instead.
  warnings.warn(
In [24]:
cnn_model_history = train_cnn_model(cnn_model, 100)
Epoch 1/100
/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/keras/src/backend/tensorflow/nn.py:717: UserWarning: "`sparse_categorical_crossentropy` received `from_logits=True`, but the `output` argument was produced by a Softmax activation and thus does not represent logits. Was this intended?
  output, from_logits = _get_logits(
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1751880477.205467   38183 service.cc:152] XLA service 0x7f05bc012bf0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1751880477.205570   38183 service.cc:160]   StreamExecutor device (0): NVIDIA GeForce GTX 1650, Compute Capability 7.5
2025-07-07 09:27:57.551929: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2025-07-07 09:28:00.123795: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.25 = (f32[32,64,32,32]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,64,32,32]{3,2,1,0} %bitcast.27082, f32[64,64,3,3]{3,2,1,0} %bitcast.24543, f32[64]{0} %bitcast.27144), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:00.471059: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.26 = (f32[32,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,64,16,16]{3,2,1,0} %bitcast.27357, f32[128,64,3,3]{3,2,1,0} %bitcast.24627, f32[128]{0} %bitcast.27420), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_2_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:00.599321: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.27 = (f32[32,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,128,16,16]{3,2,1,0} %bitcast.27615, f32[128,128,3,3]{3,2,1,0} %bitcast.24701, f32[128]{0} %bitcast.27677), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_3_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:00.723999: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.28 = (f32[32,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,128,8,8]{3,2,1,0} %bitcast.27890, f32[256,128,3,3]{3,2,1,0} %bitcast.24784, f32[256]{0} %bitcast.27953), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_4_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:00.858181: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.29 = (f32[32,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,256,8,8]{3,2,1,0} %bitcast.28148, f32[256,256,3,3]{3,2,1,0} %bitcast.24858, f32[256]{0} %bitcast.28210), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_5_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:01.003384: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.30 = (f32[32,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,256,4,4]{3,2,1,0} %bitcast.28423, f32[512,256,3,3]{3,2,1,0} %bitcast.24942, f32[512]{0} %bitcast.28486), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_6_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:01.166822: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.31 = (f32[32,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,512,4,4]{3,2,1,0} %bitcast.28681, f32[512,512,3,3]{3,2,1,0} %bitcast.25016, f32[512]{0} %bitcast.28743), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_7_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
I0000 00:00:1751880491.630373   38183 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
103/104 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - accuracy: 0.0590 - loss: 6.4037 - top-3-accuracy: 0.1725
2025-07-07 09:28:17.337108: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.25 = (f32[11,64,32,32]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,64,32,32]{3,2,1,0} %bitcast.27082, f32[64,64,3,3]{3,2,1,0} %bitcast.24543, f32[64]{0} %bitcast.27144), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:17.445967: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.26 = (f32[11,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,64,16,16]{3,2,1,0} %bitcast.27357, f32[128,64,3,3]{3,2,1,0} %bitcast.24627, f32[128]{0} %bitcast.27420), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_2_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:17.528953: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.27 = (f32[11,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,128,16,16]{3,2,1,0} %bitcast.27615, f32[128,128,3,3]{3,2,1,0} %bitcast.24701, f32[128]{0} %bitcast.27677), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_3_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:17.630389: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.28 = (f32[11,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,128,8,8]{3,2,1,0} %bitcast.27890, f32[256,128,3,3]{3,2,1,0} %bitcast.24784, f32[256]{0} %bitcast.27953), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_4_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:17.723411: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.29 = (f32[11,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,256,8,8]{3,2,1,0} %bitcast.28148, f32[256,256,3,3]{3,2,1,0} %bitcast.24858, f32[256]{0} %bitcast.28210), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_5_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:17.876003: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.30 = (f32[11,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,256,4,4]{3,2,1,0} %bitcast.28423, f32[512,256,3,3]{3,2,1,0} %bitcast.24942, f32[512]{0} %bitcast.28486), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_6_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:18.021742: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.31 = (f32[11,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[11,512,4,4]{3,2,1,0} %bitcast.28681, f32[512,512,3,3]{3,2,1,0} %bitcast.25016, f32[512]{0} %bitcast.28743), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_7_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
104/104 ━━━━━━━━━━━━━━━━━━━━ 0s 143ms/step - accuracy: 0.0590 - loss: 6.4020 - top-3-accuracy: 0.1728
2025-07-07 09:28:27.243247: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.25 = (f32[32,64,32,32]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,64,32,32]{3,2,1,0} %bitcast.1404, f32[64,64,3,3]{3,2,1,0} %bitcast.1411, f32[64]{0} %bitcast.1413), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:27.404529: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.26 = (f32[32,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,64,16,16]{3,2,1,0} %bitcast.1442, f32[128,64,3,3]{3,2,1,0} %bitcast.1449, f32[128]{0} %bitcast.1451), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_2_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:27.520322: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.27 = (f32[32,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,128,16,16]{3,2,1,0} %bitcast.1478, f32[128,128,3,3]{3,2,1,0} %bitcast.1485, f32[128]{0} %bitcast.1487), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_3_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:27.644742: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.28 = (f32[32,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,128,8,8]{3,2,1,0} %bitcast.1515, f32[256,128,3,3]{3,2,1,0} %bitcast.1522, f32[256]{0} %bitcast.1524), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_4_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:27.766693: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.29 = (f32[32,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,256,8,8]{3,2,1,0} %bitcast.1551, f32[256,256,3,3]{3,2,1,0} %bitcast.1558, f32[256]{0} %bitcast.1560), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_5_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:27.895780: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.30 = (f32[32,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,256,4,4]{3,2,1,0} %bitcast.1588, f32[512,256,3,3]{3,2,1,0} %bitcast.1595, f32[512]{0} %bitcast.1597), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_6_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:28.096669: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.31 = (f32[32,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,512,4,4]{3,2,1,0} %bitcast.1624, f32[512,512,3,3]{3,2,1,0} %bitcast.1631, f32[512]{0} %bitcast.1633), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_7_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.196200: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.25 = (f32[15,64,32,32]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,64,32,32]{3,2,1,0} %bitcast.1404, f32[64,64,3,3]{3,2,1,0} %bitcast.1411, f32[64]{0} %bitcast.1413), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_1_2/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.313694: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.26 = (f32[15,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,64,16,16]{3,2,1,0} %bitcast.1442, f32[128,64,3,3]{3,2,1,0} %bitcast.1449, f32[128]{0} %bitcast.1451), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_2_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.408962: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.27 = (f32[15,128,16,16]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,128,16,16]{3,2,1,0} %bitcast.1478, f32[128,128,3,3]{3,2,1,0} %bitcast.1485, f32[128]{0} %bitcast.1487), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_3_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.526734: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.28 = (f32[15,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,128,8,8]{3,2,1,0} %bitcast.1515, f32[256,128,3,3]{3,2,1,0} %bitcast.1522, f32[256]{0} %bitcast.1524), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_4_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.635736: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.29 = (f32[15,256,8,8]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,256,8,8]{3,2,1,0} %bitcast.1551, f32[256,256,3,3]{3,2,1,0} %bitcast.1558, f32[256]{0} %bitcast.1560), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_5_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.794925: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.30 = (f32[15,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,256,4,4]{3,2,1,0} %bitcast.1588, f32[512,256,3,3]{3,2,1,0} %bitcast.1595, f32[512]{0} %bitcast.1597), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_6_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
2025-07-07 09:28:30.945304: I external/local_xla/xla/service/gpu/autotuning/conv_algorithm_picker.cc:549] Omitted potentially buggy algorithm eng14{k25=0} for conv %cudnn-conv-bias-activation.31 = (f32[15,512,4,4]{3,2,1,0}, u8[0]{0}) custom-call(f32[15,512,4,4]{3,2,1,0} %bitcast.1624, f32[512,512,3,3]{3,2,1,0} %bitcast.1631, f32[512]{0} %bitcast.1633), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", metadata={op_type="Conv2D" op_name="sequential_1/conv2d_7_1/convolution" source_file="/home/manpreet/msds/deep_learning/repo/MSDS-5511-manpreet/final/.venv/lib/python3.12/site-packages/tensorflow/python/framework/ops.py" source_line=1200}, backend_config={"operation_queue_id":"0","wait_on_operation_queues":[],"cudnn_conv_backend_config":{"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0},"force_earliest_schedule":false}
104/104 ━━━━━━━━━━━━━━━━━━━━ 42s 198ms/step - accuracy: 0.0591 - loss: 6.4002 - top-3-accuracy: 0.1730 - val_accuracy: 0.0518 - val_loss: 14.6119 - val_top-3-accuracy: 0.1471 - learning_rate: 0.0010
Epoch 2/100
Epoch 2/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.0864 - loss: 5.8407 - top-3-accuracy: 0.2109 - val_accuracy: 0.1008 - val_loss: 5.7516 - val_top-3-accuracy: 0.2589 - learning_rate: 0.0010
Epoch 3/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.1107 - loss: 5.5010 - top-3-accuracy: 0.2680 - val_accuracy: 0.1090 - val_loss: 5.3988 - val_top-3-accuracy: 0.2779 - learning_rate: 0.0010
Epoch 4/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step - accuracy: 0.1343 - loss: 5.1742 - top-3-accuracy: 0.3259 - val_accuracy: 0.1362 - val_loss: 5.0932 - val_top-3-accuracy: 0.3134 - learning_rate: 0.0010
Epoch 5/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step - accuracy: 0.1577 - loss: 4.9379 - top-3-accuracy: 0.3608 - val_accuracy: 0.1471 - val_loss: 4.6976 - val_top-3-accuracy: 0.3787 - learning_rate: 0.0010
Epoch 6/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.1742 - loss: 4.7030 - top-3-accuracy: 0.3918 - val_accuracy: 0.1744 - val_loss: 4.5750 - val_top-3-accuracy: 0.4087 - learning_rate: 0.0010
Epoch 7/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step - accuracy: 0.2012 - loss: 4.4814 - top-3-accuracy: 0.4339 - val_accuracy: 0.1362 - val_loss: 4.6869 - val_top-3-accuracy: 0.3951 - learning_rate: 0.0010
Epoch 8/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.2298 - loss: 4.2896 - top-3-accuracy: 0.4707 - val_accuracy: 0.1253 - val_loss: 4.8076 - val_top-3-accuracy: 0.3406 - learning_rate: 0.0010
Epoch 9/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 42ms/step - accuracy: 0.2497 - loss: 4.1061 - top-3-accuracy: 0.4912 - val_accuracy: 0.1553 - val_loss: 4.8309 - val_top-3-accuracy: 0.3624 - learning_rate: 0.0010
Epoch 10/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.2545 - loss: 3.9499 - top-3-accuracy: 0.5225 - val_accuracy: 0.2207 - val_loss: 4.1164 - val_top-3-accuracy: 0.4496 - learning_rate: 0.0010
Epoch 11/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.2637 - loss: 3.8169 - top-3-accuracy: 0.5469 - val_accuracy: 0.1935 - val_loss: 4.4018 - val_top-3-accuracy: 0.3815 - learning_rate: 0.0010
Epoch 12/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 5s 53ms/step - accuracy: 0.2917 - loss: 3.6386 - top-3-accuracy: 0.5686 - val_accuracy: 0.1798 - val_loss: 4.5746 - val_top-3-accuracy: 0.3815 - learning_rate: 0.0010
Epoch 13/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 5s 46ms/step - accuracy: 0.3126 - loss: 3.5000 - top-3-accuracy: 0.6043 - val_accuracy: 0.2343 - val_loss: 3.8923 - val_top-3-accuracy: 0.4741 - learning_rate: 0.0010
Epoch 14/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step - accuracy: 0.3210 - loss: 3.4039 - top-3-accuracy: 0.6111 - val_accuracy: 0.2343 - val_loss: 4.0608 - val_top-3-accuracy: 0.4741 - learning_rate: 0.0010
Epoch 15/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 38ms/step - accuracy: 0.3512 - loss: 3.2937 - top-3-accuracy: 0.6345 - val_accuracy: 0.2316 - val_loss: 4.0256 - val_top-3-accuracy: 0.4687 - learning_rate: 0.0010
Epoch 16/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 38ms/step - accuracy: 0.3801 - loss: 3.1198 - top-3-accuracy: 0.6670 - val_accuracy: 0.2071 - val_loss: 4.2591 - val_top-3-accuracy: 0.4823 - learning_rate: 0.0010
Epoch 17/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.3979 - loss: 3.0141 - top-3-accuracy: 0.6920 - val_accuracy: 0.1935 - val_loss: 4.6317 - val_top-3-accuracy: 0.4469 - learning_rate: 0.0010
Epoch 18/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.4196 - loss: 2.9310 - top-3-accuracy: 0.6959 - val_accuracy: 0.1989 - val_loss: 4.5387 - val_top-3-accuracy: 0.4196 - learning_rate: 0.0010
Epoch 19/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 5s 47ms/step - accuracy: 0.4694 - loss: 2.7654 - top-3-accuracy: 0.7408 - val_accuracy: 0.2534 - val_loss: 3.7356 - val_top-3-accuracy: 0.5095 - learning_rate: 5.0000e-04
Epoch 20/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 42ms/step - accuracy: 0.5327 - loss: 2.5147 - top-3-accuracy: 0.7895 - val_accuracy: 0.2725 - val_loss: 3.8366 - val_top-3-accuracy: 0.4823 - learning_rate: 5.0000e-04
Epoch 21/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.5789 - loss: 2.3569 - top-3-accuracy: 0.8216 - val_accuracy: 0.2725 - val_loss: 3.8735 - val_top-3-accuracy: 0.5150 - learning_rate: 5.0000e-04
Epoch 22/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 39ms/step - accuracy: 0.5606 - loss: 2.3128 - top-3-accuracy: 0.8361 - val_accuracy: 0.3188 - val_loss: 3.5727 - val_top-3-accuracy: 0.5749 - learning_rate: 5.0000e-04
Epoch 23/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.5954 - loss: 2.2203 - top-3-accuracy: 0.8496 - val_accuracy: 0.2970 - val_loss: 3.8196 - val_top-3-accuracy: 0.5368 - learning_rate: 5.0000e-04
Epoch 24/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.6168 - loss: 2.1541 - top-3-accuracy: 0.8618 - val_accuracy: 0.3106 - val_loss: 4.0954 - val_top-3-accuracy: 0.5559 - learning_rate: 5.0000e-04
Epoch 25/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.6363 - loss: 2.0852 - top-3-accuracy: 0.8816 - val_accuracy: 0.2970 - val_loss: 3.8685 - val_top-3-accuracy: 0.5477 - learning_rate: 5.0000e-04
Epoch 26/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.6532 - loss: 2.0709 - top-3-accuracy: 0.8904 - val_accuracy: 0.3106 - val_loss: 3.9825 - val_top-3-accuracy: 0.5531 - learning_rate: 5.0000e-04
Epoch 27/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 5s 47ms/step - accuracy: 0.6641 - loss: 2.0262 - top-3-accuracy: 0.8948 - val_accuracy: 0.3297 - val_loss: 4.0754 - val_top-3-accuracy: 0.5640 - learning_rate: 5.0000e-04
Epoch 28/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 42ms/step - accuracy: 0.7155 - loss: 1.8955 - top-3-accuracy: 0.9198 - val_accuracy: 0.3351 - val_loss: 3.7763 - val_top-3-accuracy: 0.5613 - learning_rate: 2.5000e-04
Epoch 29/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 39ms/step - accuracy: 0.7817 - loss: 1.7047 - top-3-accuracy: 0.9452 - val_accuracy: 0.3515 - val_loss: 3.7834 - val_top-3-accuracy: 0.5749 - learning_rate: 2.5000e-04
Epoch 30/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 40ms/step - accuracy: 0.8088 - loss: 1.5744 - top-3-accuracy: 0.9649 - val_accuracy: 0.3324 - val_loss: 3.8624 - val_top-3-accuracy: 0.5804 - learning_rate: 2.5000e-04
Epoch 31/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.8423 - loss: 1.4693 - top-3-accuracy: 0.9764 - val_accuracy: 0.3488 - val_loss: 3.8149 - val_top-3-accuracy: 0.5940 - learning_rate: 2.5000e-04
Epoch 32/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 41ms/step - accuracy: 0.8388 - loss: 1.4395 - top-3-accuracy: 0.9702 - val_accuracy: 0.3215 - val_loss: 3.9951 - val_top-3-accuracy: 0.5886 - learning_rate: 2.5000e-04
In [25]:
# Evaluate the model
plot_training_history(cnn_model_history, "CNN based Training and Validation")
val_loss, val_acc, val_top_3_acc = cnn_model.evaluate(val_ds)
print(f"Val acc = {val_acc}, val loss = {val_loss}, val_top_3_acc = {val_top_3_acc}")
results_df.append({"model_type": "CNN", "best_training_accuracy": max(cnn_model_history.history['accuracy']),
                   "validation_accuracy": val_acc, "validation_top_3":val_top_3_acc})
No description has been provided for this image
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.3246 - loss: 3.4849 - top-3-accuracy: 0.5849
Val acc = 0.31880107522010803, val loss = 3.5727267265319824, val_top_3_acc = 0.5749318599700928

Efficient net model¶

EfficientNet is a family of CNN architectures that achieves impressive performance while maintaining high efficiency through a clever compound scaling method. Instead of arbitrarily scaling network depth, width, or resolution, EfficientNet uniformly scales all three dimensions using a fixed set of scaling coefficients. This principled approach, derived through a neural architecture search, allows EfficientNet models to achieve a superior balance between accuracy and computational cost across various scales. This means you can get better performance with fewer resources by consistently increasing the network's capacity in a balanced way, making it a highly practical choice for many computer vision tasks.

In [26]:
def build_efficientnet_model(input_shape, num_classes):
    """
    Build EfficientNetB0 model with augmentation and custom head
    """
    # Base model (pre-trained model)
    base_model = EfficientNetB4(
        include_top=False,
        weights='imagenet',
        input_shape=input_shape,
        pooling='avg'
    )
    base_model.trainable = False  # Freeze base model 

    # Build full model with augmentation
    inputs = layers.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = tf.keras.applications.efficientnet.preprocess_input(x)
    x = base_model(x, training=False)
    x = layers.Dropout(DROPOUT_RATE)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = tf.keras.Model(inputs, outputs)

    return model
In [27]:
def train_efficientnet_model(model, epochs):
    optimizer = keras.optimizers.AdamW(
    learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
    model.compile(optimizer=optimizer,
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[
                      keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
                      keras.metrics.SparseTopKCategoricalAccuracy(3, name="top-3-accuracy"),])
    
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[early_stop_callback, reduce_lr_callback]
        )
    
    return history
In [28]:
efficientnet_model = build_efficientnet_model(input_shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS), num_classes=NUM_CLASSES)
In [29]:
efficientnet_history = train_efficientnet_model(efficientnet_model, 100)
Epoch 1/100
E0000 00:00:1751880666.766053   37962 meta_optimizer.cc:967] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inStatefulPartitionedCall/functional_4_1/efficientnetb4_1/block1b_drop_1/stateless_dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
104/104 ━━━━━━━━━━━━━━━━━━━━ 31s 135ms/step - accuracy: 0.0580 - loss: 3.1308 - top-3-accuracy: 0.1722 - val_accuracy: 0.0545 - val_loss: 3.0259 - val_top-3-accuracy: 0.1689 - learning_rate: 0.0010
Epoch 2/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 8s 77ms/step - accuracy: 0.0623 - loss: 3.1134 - top-3-accuracy: 0.1732 - val_accuracy: 0.0490 - val_loss: 3.0327 - val_top-3-accuracy: 0.1689 - learning_rate: 0.0010
Epoch 3/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 71ms/step - accuracy: 0.0489 - loss: 3.1173 - top-3-accuracy: 0.1622 - val_accuracy: 0.0627 - val_loss: 3.0204 - val_top-3-accuracy: 0.1744 - learning_rate: 0.0010
Epoch 4/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 69ms/step - accuracy: 0.0618 - loss: 3.1106 - top-3-accuracy: 0.1638 - val_accuracy: 0.0545 - val_loss: 3.0271 - val_top-3-accuracy: 0.1853 - learning_rate: 0.0010
Epoch 5/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 67ms/step - accuracy: 0.0560 - loss: 3.0913 - top-3-accuracy: 0.1720 - val_accuracy: 0.0572 - val_loss: 3.0333 - val_top-3-accuracy: 0.1798 - learning_rate: 0.0010
Epoch 6/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 66ms/step - accuracy: 0.0582 - loss: 3.0997 - top-3-accuracy: 0.1741 - val_accuracy: 0.0545 - val_loss: 3.0174 - val_top-3-accuracy: 0.1989 - learning_rate: 0.0010
Epoch 7/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 68ms/step - accuracy: 0.0463 - loss: 3.1125 - top-3-accuracy: 0.1575 - val_accuracy: 0.0572 - val_loss: 3.0289 - val_top-3-accuracy: 0.1853 - learning_rate: 0.0010
Epoch 8/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 63ms/step - accuracy: 0.0583 - loss: 3.1038 - top-3-accuracy: 0.1649 - val_accuracy: 0.0654 - val_loss: 3.0209 - val_top-3-accuracy: 0.1880 - learning_rate: 0.0010
Epoch 9/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 63ms/step - accuracy: 0.0647 - loss: 3.0980 - top-3-accuracy: 0.1758 - val_accuracy: 0.0599 - val_loss: 3.0312 - val_top-3-accuracy: 0.1853 - learning_rate: 0.0010
Epoch 10/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 63ms/step - accuracy: 0.0611 - loss: 3.0964 - top-3-accuracy: 0.1719 - val_accuracy: 0.0599 - val_loss: 3.0284 - val_top-3-accuracy: 0.1907 - learning_rate: 0.0010
Epoch 11/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 67ms/step - accuracy: 0.0657 - loss: 3.0940 - top-3-accuracy: 0.1884 - val_accuracy: 0.0627 - val_loss: 3.0343 - val_top-3-accuracy: 0.1853 - learning_rate: 0.0010
Epoch 12/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 8s 74ms/step - accuracy: 0.0562 - loss: 3.0916 - top-3-accuracy: 0.1595 - val_accuracy: 0.0599 - val_loss: 3.0141 - val_top-3-accuracy: 0.2044 - learning_rate: 5.0000e-04
Epoch 13/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0584 - loss: 3.0818 - top-3-accuracy: 0.1631 - val_accuracy: 0.0681 - val_loss: 3.0122 - val_top-3-accuracy: 0.2016 - learning_rate: 5.0000e-04
Epoch 14/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.0702 - loss: 3.0626 - top-3-accuracy: 0.1798 - val_accuracy: 0.0681 - val_loss: 3.0109 - val_top-3-accuracy: 0.2044 - learning_rate: 5.0000e-04
Epoch 15/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 63ms/step - accuracy: 0.0622 - loss: 3.0584 - top-3-accuracy: 0.1767 - val_accuracy: 0.0654 - val_loss: 3.0124 - val_top-3-accuracy: 0.2016 - learning_rate: 5.0000e-04
Epoch 16/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 69ms/step - accuracy: 0.0631 - loss: 3.0591 - top-3-accuracy: 0.1782 - val_accuracy: 0.0599 - val_loss: 3.0173 - val_top-3-accuracy: 0.1880 - learning_rate: 5.0000e-04
Epoch 17/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0617 - loss: 3.0601 - top-3-accuracy: 0.1773 - val_accuracy: 0.0627 - val_loss: 3.0153 - val_top-3-accuracy: 0.2044 - learning_rate: 5.0000e-04
Epoch 18/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0640 - loss: 3.0608 - top-3-accuracy: 0.1733 - val_accuracy: 0.0627 - val_loss: 3.0121 - val_top-3-accuracy: 0.2071 - learning_rate: 5.0000e-04
Epoch 19/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0573 - loss: 3.0840 - top-3-accuracy: 0.1722 - val_accuracy: 0.0627 - val_loss: 3.0107 - val_top-3-accuracy: 0.1826 - learning_rate: 5.0000e-04
Epoch 20/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0675 - loss: 3.0386 - top-3-accuracy: 0.1878 - val_accuracy: 0.0627 - val_loss: 3.0181 - val_top-3-accuracy: 0.2071 - learning_rate: 5.0000e-04
Epoch 21/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 71ms/step - accuracy: 0.0614 - loss: 3.0450 - top-3-accuracy: 0.1792 - val_accuracy: 0.0627 - val_loss: 3.0135 - val_top-3-accuracy: 0.2071 - learning_rate: 5.0000e-04
Epoch 22/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 61ms/step - accuracy: 0.0659 - loss: 3.0588 - top-3-accuracy: 0.1746 - val_accuracy: 0.0627 - val_loss: 3.0183 - val_top-3-accuracy: 0.1935 - learning_rate: 5.0000e-04
Epoch 23/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 60ms/step - accuracy: 0.0672 - loss: 3.0509 - top-3-accuracy: 0.1870 - val_accuracy: 0.0627 - val_loss: 3.0188 - val_top-3-accuracy: 0.2044 - learning_rate: 5.0000e-04
Epoch 24/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0532 - loss: 3.0557 - top-3-accuracy: 0.1667 - val_accuracy: 0.0627 - val_loss: 3.0187 - val_top-3-accuracy: 0.2016 - learning_rate: 5.0000e-04
Epoch 25/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0577 - loss: 3.0624 - top-3-accuracy: 0.1708 - val_accuracy: 0.0654 - val_loss: 2.9979 - val_top-3-accuracy: 0.1962 - learning_rate: 2.5000e-04
Epoch 26/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 8s 73ms/step - accuracy: 0.0597 - loss: 3.0279 - top-3-accuracy: 0.1728 - val_accuracy: 0.0654 - val_loss: 2.9975 - val_top-3-accuracy: 0.1989 - learning_rate: 2.5000e-04
Epoch 27/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 60ms/step - accuracy: 0.0654 - loss: 3.0267 - top-3-accuracy: 0.1754 - val_accuracy: 0.0627 - val_loss: 2.9982 - val_top-3-accuracy: 0.2044 - learning_rate: 2.5000e-04
Epoch 28/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 67ms/step - accuracy: 0.0716 - loss: 3.0184 - top-3-accuracy: 0.1884 - val_accuracy: 0.0681 - val_loss: 3.0012 - val_top-3-accuracy: 0.1962 - learning_rate: 2.5000e-04
Epoch 29/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0726 - loss: 3.0057 - top-3-accuracy: 0.1872 - val_accuracy: 0.0681 - val_loss: 2.9998 - val_top-3-accuracy: 0.1880 - learning_rate: 2.5000e-04
Epoch 30/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 66ms/step - accuracy: 0.0718 - loss: 3.0169 - top-3-accuracy: 0.1940 - val_accuracy: 0.0736 - val_loss: 3.0000 - val_top-3-accuracy: 0.2098 - learning_rate: 2.5000e-04
Epoch 31/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 70ms/step - accuracy: 0.0611 - loss: 3.0302 - top-3-accuracy: 0.1818 - val_accuracy: 0.0654 - val_loss: 2.9970 - val_top-3-accuracy: 0.2016 - learning_rate: 2.5000e-04
Epoch 32/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 59ms/step - accuracy: 0.0647 - loss: 3.0342 - top-3-accuracy: 0.1828 - val_accuracy: 0.0654 - val_loss: 3.0008 - val_top-3-accuracy: 0.1880 - learning_rate: 2.5000e-04
Epoch 33/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 59ms/step - accuracy: 0.0618 - loss: 3.0258 - top-3-accuracy: 0.1836 - val_accuracy: 0.0654 - val_loss: 3.0021 - val_top-3-accuracy: 0.1935 - learning_rate: 2.5000e-04
Epoch 34/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0651 - loss: 3.0184 - top-3-accuracy: 0.1845 - val_accuracy: 0.0572 - val_loss: 2.9972 - val_top-3-accuracy: 0.1935 - learning_rate: 2.5000e-04
Epoch 35/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 66ms/step - accuracy: 0.0613 - loss: 3.0293 - top-3-accuracy: 0.1750 - val_accuracy: 0.0627 - val_loss: 2.9993 - val_top-3-accuracy: 0.1962 - learning_rate: 2.5000e-04
Epoch 36/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 72ms/step - accuracy: 0.0656 - loss: 3.0284 - top-3-accuracy: 0.1871 - val_accuracy: 0.0654 - val_loss: 2.9923 - val_top-3-accuracy: 0.2125 - learning_rate: 2.5000e-04
Epoch 37/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0674 - loss: 3.0287 - top-3-accuracy: 0.1863 - val_accuracy: 0.0708 - val_loss: 2.9979 - val_top-3-accuracy: 0.2098 - learning_rate: 2.5000e-04
Epoch 38/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 62ms/step - accuracy: 0.0671 - loss: 3.0233 - top-3-accuracy: 0.1747 - val_accuracy: 0.0654 - val_loss: 2.9933 - val_top-3-accuracy: 0.2044 - learning_rate: 2.5000e-04
Epoch 39/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 65ms/step - accuracy: 0.0599 - loss: 3.0205 - top-3-accuracy: 0.1817 - val_accuracy: 0.0654 - val_loss: 2.9987 - val_top-3-accuracy: 0.1935 - learning_rate: 2.5000e-04
Epoch 40/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 66ms/step - accuracy: 0.0724 - loss: 3.0080 - top-3-accuracy: 0.1998 - val_accuracy: 0.0654 - val_loss: 2.9967 - val_top-3-accuracy: 0.2044 - learning_rate: 2.5000e-04
Epoch 41/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 67ms/step - accuracy: 0.0640 - loss: 3.0126 - top-3-accuracy: 0.1788 - val_accuracy: 0.0736 - val_loss: 2.9985 - val_top-3-accuracy: 0.2071 - learning_rate: 2.5000e-04
Epoch 42/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 60ms/step - accuracy: 0.0567 - loss: 3.0361 - top-3-accuracy: 0.1727 - val_accuracy: 0.0490 - val_loss: 3.0001 - val_top-3-accuracy: 0.1608 - learning_rate: 1.2500e-04
Epoch 43/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0687 - loss: 3.0184 - top-3-accuracy: 0.1863 - val_accuracy: 0.0463 - val_loss: 3.0000 - val_top-3-accuracy: 0.1471 - learning_rate: 1.2500e-04
Epoch 44/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 6s 59ms/step - accuracy: 0.0746 - loss: 3.0003 - top-3-accuracy: 0.2006 - val_accuracy: 0.0490 - val_loss: 2.9987 - val_top-3-accuracy: 0.1635 - learning_rate: 1.2500e-04
Epoch 45/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 64ms/step - accuracy: 0.0781 - loss: 3.0039 - top-3-accuracy: 0.1933 - val_accuracy: 0.0545 - val_loss: 2.9959 - val_top-3-accuracy: 0.1608 - learning_rate: 1.2500e-04
Epoch 46/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 7s 71ms/step - accuracy: 0.0822 - loss: 3.0022 - top-3-accuracy: 0.2034 - val_accuracy: 0.0518 - val_loss: 2.9988 - val_top-3-accuracy: 0.1471 - learning_rate: 1.2500e-04
In [30]:
plot_training_history(efficientnet_history, "Efficient net Training and Validation")
val_loss, val_acc, val_top_3_acc = efficientnet_model.evaluate(val_ds)
print(f"Val acc = {val_acc}, val loss = {val_loss}, val_top_3_acc = {val_top_3_acc}")
results_df.append({"model_type": "efficient net", "best_training_accuracy": max(efficientnet_history.history['accuracy']), 
                   "validation_accuracy": val_acc, "validation_top_3":val_top_3_acc})
No description has been provided for this image
12/12 ━━━━━━━━━━━━━━━━━━━━ 1s 43ms/step - accuracy: 0.0633 - loss: 2.9894 - top-3-accuracy: 0.2167
Val acc = 0.06539509445428848, val loss = 2.9922616481781006, val_top_3_acc = 0.21253405511379242

Resnet 50¶

ResNet-50 is another CNN architecture that revolutionized deep learning by effectively addressing the vanishing gradient problem in very deep networks. It achieves this primarily through the ingenious use of "residual connections" or "skip connections," which allow the input from a previous layer to be directly added to the output of a later layer, bypassing one or more convolutional layers. This enables the training of much deeper networks (hence "ResNet" for Residual Network) without degradation in performance, as the shortcut paths ensure that gradients can flow more easily through the network during backpropagation.

In [31]:
def build_resnet_model(input_shape, num_classes):
    """
    Build Resnet model
    """
    # Base model (pre-trained model)
    base_model = ResNet50V2(
        include_top=False,
        weights='imagenet',
        input_shape=input_shape,
        pooling='avg'
    )
    base_model.trainable = False  # Freeze base model 

    # Build full model with augmentation
    inputs = layers.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = tf.keras.applications.efficientnet.preprocess_input(x)
    x = base_model(x, training=False)
    x = layers.Dropout(DROPOUT_RATE)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = tf.keras.Model(inputs, outputs)

    return model
In [32]:
def train_resnet_model(model, epochs):
    optimizer = keras.optimizers.AdamW(
    learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
    model.compile(optimizer=optimizer,
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[
                      keras.metrics.SparseCategoricalAccuracy(name="accuracy"),
                      keras.metrics.SparseTopKCategoricalAccuracy(3, name="top-3-accuracy"),])
    
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        callbacks=[early_stop_callback, reduce_lr_callback]
        )
    
    return history
In [33]:
resnet_model = build_resnet_model(input_shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS), num_classes=NUM_CLASSES)
resnet_history = train_resnet_model(resnet_model, 100)
Epoch 1/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 15s 70ms/step - accuracy: 0.0683 - loss: 3.3073 - top-3-accuracy: 0.1790 - val_accuracy: 0.0954 - val_loss: 3.2474 - val_top-3-accuracy: 0.2207 - learning_rate: 0.0010
Epoch 2/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 38ms/step - accuracy: 0.1024 - loss: 3.0910 - top-3-accuracy: 0.2253 - val_accuracy: 0.1144 - val_loss: 3.2737 - val_top-3-accuracy: 0.2289 - learning_rate: 0.0010
Epoch 3/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 36ms/step - accuracy: 0.1036 - loss: 2.9882 - top-3-accuracy: 0.2555 - val_accuracy: 0.1117 - val_loss: 3.2934 - val_top-3-accuracy: 0.2452 - learning_rate: 0.0010
Epoch 4/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 43ms/step - accuracy: 0.1188 - loss: 2.9695 - top-3-accuracy: 0.2677 - val_accuracy: 0.1199 - val_loss: 3.2632 - val_top-3-accuracy: 0.2643 - learning_rate: 0.0010
Epoch 5/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 38ms/step - accuracy: 0.1187 - loss: 2.9380 - top-3-accuracy: 0.2819 - val_accuracy: 0.1390 - val_loss: 3.2934 - val_top-3-accuracy: 0.2643 - learning_rate: 0.0010
Epoch 6/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.1104 - loss: 2.9809 - top-3-accuracy: 0.2660 - val_accuracy: 0.1063 - val_loss: 3.2994 - val_top-3-accuracy: 0.2589 - learning_rate: 0.0010
Epoch 7/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.1163 - loss: 2.9420 - top-3-accuracy: 0.2743 - val_accuracy: 0.1144 - val_loss: 3.3799 - val_top-3-accuracy: 0.2698 - learning_rate: 5.0000e-04
Epoch 8/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.1316 - loss: 2.8929 - top-3-accuracy: 0.2909 - val_accuracy: 0.1172 - val_loss: 3.3686 - val_top-3-accuracy: 0.2725 - learning_rate: 5.0000e-04
Epoch 9/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 36ms/step - accuracy: 0.1296 - loss: 2.8450 - top-3-accuracy: 0.2987 - val_accuracy: 0.1144 - val_loss: 3.3989 - val_top-3-accuracy: 0.2507 - learning_rate: 5.0000e-04
Epoch 10/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.1408 - loss: 2.8582 - top-3-accuracy: 0.3053 - val_accuracy: 0.1063 - val_loss: 3.3935 - val_top-3-accuracy: 0.2698 - learning_rate: 5.0000e-04
Epoch 11/100
104/104 ━━━━━━━━━━━━━━━━━━━━ 4s 37ms/step - accuracy: 0.1203 - loss: 2.8751 - top-3-accuracy: 0.3187 - val_accuracy: 0.1144 - val_loss: 3.3510 - val_top-3-accuracy: 0.2834 - learning_rate: 5.0000e-04
In [34]:
plot_training_history(resnet_history, "Resnet Training and Validation")
val_loss, val_acc, val_top_3_acc = resnet_model.evaluate(val_ds)
print(f"Val acc = {val_acc}, val loss = {val_loss}, val_top_3_acc = {val_top_3_acc}")
results_df.append({"model_type": "resnet", "best_training_accuracy": max(resnet_history.history['accuracy']), 
                   "validation_accuracy": val_acc, "validation_top_3":val_top_3_acc})
No description has been provided for this image
12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - accuracy: 0.1030 - loss: 3.1576 - top-3-accuracy: 0.2351
Val acc = 0.09536784887313843, val loss = 3.247427463531494, val_top_3_acc = 0.2207084447145462

Results¶

In [35]:
#combining results
pd.DataFrame(results_df)
Out[35]:
model_type best_training_accuracy validation_accuracy validation_top_3
0 vanilla_vit 0.200786 0.119891 0.294278
1 self_attention_vit 0.257635 0.226158 0.474114
2 CNN 0.855458 0.318801 0.574932
3 efficient net 0.075900 0.065395 0.212534
4 resnet 0.143332 0.095368 0.220708

Discussion/Conclusion¶

Based on observed results the CNN based model significantly outperformed other architectures, achieving the highest best training accuracy of 0.857877, validation accuracy of 0.318801, and validation top-3 accuracy of 0.591281. The self-attention ViT model showed the second-best performance across all metrics, suggesting the potential benefits of attention mechanisms in vision transformers, although it still lagged considerably behind the CNN. Conversely, the EfficientNet and vanilla ViT models demonstrated the lowest performance, indicating their current configurations or inherent architectures were less suitable for this specific image classification task compared to CNNs or self-attention ViTs. ResNet also underperformed relative to the CNN and self-attention ViT. These findings highlight the strong applicability of convolutional neural networks for this classification problem and suggest further optimization or different architectural choices may be needed for the transformer and other models to reach comparable performance.

One major reason, CNN outperformed others architectures could be the size of data. Each of 20 image labels had images from 150-200 in count. Vision Transformers are known to be "data hungry" and this excercise reinforces it.

For large, complex CNNs like ResNet and EfficientNet, while they are CNN based, their massive capacity (depth and number of parameters) can still lead to overfitting on very small datasets without careful regularization or transfer learning.